New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Universal parser] Introduce rb_parser_ary_t #10004
Conversation
bc35630
to
5c1b674
Compare
node.c
Outdated
@@ -483,7 +511,7 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj) | |||
VALUE | |||
rb_ast_tokens(rb_ast_t *ast) | |||
{ | |||
return ast->node_buffer->tokens; | |||
return (VALUE)ast->node_buffer->tokens; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is rb_parser_ary_t
compatible with VALUE
?
This function is used in the implementation of RubyVM::AbstractSyntaxTree::Node#all_tokens
, and the return value will be seen in Ruby level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reimplemented rb_ast_tokens()
that constructs RArray from rb_parser_ary_t 8a986a2
parse.y
Outdated
p->token_id++; | ||
|
||
if (p->debug) { | ||
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary); | ||
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, (VALUE)token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PRsVALUE
calls to_s
method on the corresponding argument.
rb_parser_ast_token_t
does not seem to compatible with VALUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
I implemented rb_parser_str_escape()
which does not depend on VALUE
15e42bc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nobu Could you please review them again?
4cad33b
to
8a986a2
Compare
parse.y
Outdated
ary->data = xrealloc(ary->data, sizeof(void *) * len); | ||
} | ||
for (i = ary->len; i < len; i++) { | ||
ary->data[i] = Qnil; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not Qnil
but 0
(or NULL
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in hasumikin@9c64383
parse.y
Outdated
ary->data = NULL; | ||
} | ||
for (i = 0; i < len; i++) { | ||
ary->data[i] = Qnil; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not Qnil
but 0
(or NULL
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in hasumikin@9c64383
parse.y
Outdated
rb_parser_ary_push(rb_parser_t *p, rb_parser_ary_t *ary, VALUE val) | ||
{ | ||
if (ary->len == ary->capa) { | ||
rb_parser_ary_extend(p, ary, ary->len + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanding the array with ary->len + 1
is no efficient especially for ast token array because it's expected to be expanded again and again. How about to double the capacity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed: hasumikin@d1fd5ba
BTW, what does the comment // TODO
in rb_ruby_parser_keep_tokens()
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might to be "Remove dependency for Ruby Array", then please remove the // TODO
comment in this PR, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the line hasumikin@968c063
node.c
Outdated
parser_ast_token_free(rb_ast_t *ast, rb_parser_ast_token_t *token) | ||
{ | ||
if (!token) return; | ||
parser_string_free(ast, (rb_parser_string_t *)token->str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this cast for rb_parser_string_t *
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in hasumikin@9c64383
node.c
Outdated
@@ -483,7 +523,32 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj) | |||
VALUE | |||
rb_ast_tokens(rb_ast_t *ast) | |||
{ | |||
return ast->node_buffer->tokens; | |||
long i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about to move this ruby object construction logic to ast.c
? By the change, node.c no need to handle VALUE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought that too but,
If we move the logic from node.c to ast.c, do you also think we should redesign ast->node_buffer->tokens
into ast->tokens
because we can say the tokens
member no longer belongs to a Node?
(Of course, we cannot change the Ruby level design: AbstractSyntaxTree::Node
though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a long term, it's correct however It's impossible right now. rb_ast_t
is imemo (internal ruby object) now therefore it's size should be 5 words. It's already uses 5 words like below. We can refactor it once rb_ast_t
is not managed as imemo
.
typedef struct rb_ast_body_struct {
const NODE *root;
VALUE script_lines;
// script_lines is either:
// - a Fixnum that represents the line count of the original source, or
// - an Array that contains the lines of the original source
signed int frozen_string_literal:2; /* -1: not specified, 0: false, 1: true */
signed int coverage_enabled:2; /* -1: not specified, 0: false, 1: true */
} rb_ast_body_t;
typedef struct rb_ast_struct {
VALUE flags;
node_buffer_t *node_buffer;
rb_ast_body_t body;
} rb_ast_t;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did
hasumikin@89667cd
rb_obj_freeze(tokens); | ||
rb_ast_set_tokens(p->ast, tokens); | ||
p->ast->node_buffer->tokens = tokens; | ||
p->tokens = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add free for rb_ruby_parser_free
just in case, if an exception happens in the middle of parsing, this line can be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this OK? hasumikin@c2d3fc8 (I amend a function name parser_ary_free()
to parser_token_free()
for clarity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parser_token_free
is preferred, BTW the existence of ast
and ast->node_buffer
is confirmed by rb_ast_free
. Therefore it's more simple to call parser_tokens_free
directly in rb_node_buffer_free
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Fixed hasumikin@e9e3ac6
parse.y
Outdated
p->token_id++; | ||
|
||
if (p->debug) { | ||
rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary); | ||
rb_parser_printf(p, "Append tokens (line: %d) [%d, :%s, \"%s\", [%d, %d, %d, %d]]\n", | ||
line, token->id, token->type_name, rb_parser_str_escape(p, token->str)->ptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess rb_parser_string_t
allocated in rb_parser_str_escape
is not freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed in hasumikin@edd7c6d
10e95cd
to
e9e3ac6
Compare
parse.y
Outdated
for (i = 0; i < len; i++) { | ||
ary->data[i] = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xcalloc
zero-fills as well as the standard calloc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Fixed
rubyparser.h
Outdated
* Array-like object for parser | ||
*/ | ||
typedef struct rb_parser_ary { | ||
uintptr_t *data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this data
is a pointer to uintptr_t
instead of rb_parser_ast_token_t
?
Also val
of rb_parser_ary_push
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We plan to have rb_parser_ary store elements of arbitrary types. However, after talking to Kaneko-san, we decided to limit it to the rb_parser_ast_token_t*
type for now.
85ff09c
Thank you for the review!
b5fffa1
to
f12fb8d
Compare
- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y - In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *` - Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t` - Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c - Implement `rb_parser_str_escape()` - This is a port of the `rb_str_escape()` function in string.c - `rb_parser_str_escape()` does not depend on `VALUE` (RString) - Instead, it uses `rb_parser_stirng_t *` - This function works when --dump=y option passed - Because WIP of the universal parser, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future - Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)
Thank you! |
This PR is part of the work to make Lrama-generating-parse.c a universal parser
rb_parser_ary_t
struct to partly eliminate RArray from parse.yparser_params->tokens
andparser_params->ast->node_buffer->tokens
becomesrb_parser_ary_t *
ast_node_all_tokens()
internally creates a Ruby Array object from therb_parser_ary_t *tokens